Production data

Graphical exploration

Production units all follow similar day-night and seasonal cycles

This is reasonable for solar power.

Some units show obvious examples of production being added

Normalizing to installed capacity


The installed capacity on any given day is an upper limit on production.
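
As a minimal sketch of this normalization, assuming production and installed capacity are available as aligned arrays (the function name, the units, and the sample values below are hypothetical, not taken from the dataset):

```python
import numpy as np

def normalize_by_capacity(production, capacity):
    """Return production as a fraction of installed capacity.

    Entries where capacity is zero, negative, or missing become NaN.
    """
    production = np.asarray(production, dtype=float)
    capacity = np.asarray(capacity, dtype=float)
    fraction = np.full_like(production, np.nan)
    valid = np.isfinite(capacity) & (capacity > 0)
    fraction[valid] = production[valid] / capacity[valid]
    return fraction

# Hypothetical values: the last entry falls after a capacity increase.
production = np.array([0.0, 2.0, 4.0, 9.0])
capacity = np.array([5.0, 5.0, 5.0, 10.0])
fraction = normalize_by_capacity(production, capacity)
```

If capacity really is an upper limit, every finite value of `fraction` should lie between 0 and 1, which makes a quick sanity check on the data.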


When new capacity is installed between the date when the data is measured and the date that we are predicting, the normalization is thrown off.

This will be a problem for prediction, but not for training. Correcting it would require predicting when new capacity will be installed.

Direction: These changes in production offer an opportunity

We can test how good the model of production as a fraction of capacity is. Centering on the capacity changes, there should be no statistical difference between production as a fraction of installed capacity before and after a new installation.
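
One way such a before/after comparison could be set up is a permutation test on the difference in mean capacity fraction. This is only a sketch: the choice of test, the function name, and the synthetic samples are assumptions, and hourly values are autocorrelated, so in practice the two windows would need to be matched (same hours, similar season) before the samples can be treated as exchangeable.

```python
import numpy as np

def permutation_test_mean_diff(before, after, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the observed difference (after minus before) and a p-value.
    """
    rng = np.random.default_rng(seed)
    before = np.asarray(before, dtype=float)
    after = np.asarray(after, dtype=float)
    observed = after.mean() - before.mean()
    pooled = np.concatenate([before, after])
    n_before = before.size
    count = 0
    for _ in range(n_permutations):
        shuffled = rng.permutation(pooled)
        diff = shuffled[n_before:].mean() - shuffled[:n_before].mean()
        if abs(diff) >= abs(observed):
            count += 1
    # Add-one correction keeps the p-value strictly positive.
    p_value = (count + 1) / (n_permutations + 1)
    return observed, p_value

# Synthetic capacity fractions (hypothetical, not from the dataset).
rng = np.random.default_rng(1)
before = rng.beta(2, 5, size=200)
shifted = before + 0.3  # a deliberately broken normalization

_, p_same = permutation_test_mean_diff(before, before)
_, p_shift = permutation_test_mean_diff(before, shifted)
```

A large p-value around a capacity change would be consistent with the normalization working; a small one, as in the shifted case, would indicate that the fraction changed with the installation.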

First, however, we need to know the distribution of production values.

Histograms of capacity-normalized data


The data is dominated by small values due to nights and winters.


Winter and weather mean that small values dominate, even at noon.


If we limit ourselves to the summer (June-August) at noon, we see a very different distribution of production values. This time it is flatter, and peaks at a large value.
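
The filtering behind these histograms can be sketched as follows. The data here is synthetic: the year, the toy daylight, season, and weather factors, and all variable names are assumptions made for illustration, so only the masking and binning steps carry over to the real data.

```python
import numpy as np

# Hypothetical hourly timestamps for one year (2015 is an arbitrary choice).
timestamps = np.arange("2015-01-01T00", "2016-01-01T00", dtype="datetime64[h]")
days = timestamps.astype("datetime64[D]")
hour = (timestamps - days).astype(int)                           # 0..23
month = timestamps.astype("datetime64[M]").astype(int) % 12 + 1  # 1..12
day_of_year = (days - timestamps.astype("datetime64[Y]")).astype(int)

# Toy capacity fraction: daylight curve x seasonal factor x random weather.
rng = np.random.default_rng(0)
daylight = np.clip(np.sin(np.pi * (hour - 6) / 12), 0.0, None)
season = 0.65 - 0.35 * np.cos(2 * np.pi * day_of_year / 365)
weather = rng.uniform(0.2, 1.0, size=timestamps.size)
fraction = daylight * season * weather

# Histogram of everything versus summer (June-August) at noon only.
summer_noon = (hour == 12) & np.isin(month, [6, 7, 8])
all_counts, edges = np.histogram(fraction, bins=10, range=(0.0, 1.0))
summer_counts, _ = np.histogram(fraction[summer_noon], bins=10, range=(0.0, 1.0))
```

Comparing `all_counts` with `summer_counts` shows the same qualitative contrast: the full histogram piles up in the lowest bin, while the summer-noon subset does not.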

This represents a challenge for modelling the data:

The impact of the noise (i.e., weather) depends on the context of other seasonal features. Any model that doesn't take this into account will fail to model the noise properly.

  • Direction: One option to address this is to run a regression against the parameters of a beta distribution. This would capture the distribution of outputs that we expect, and could be useful for future models.
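
As a first step in that direction, the two beta parameters can be estimated for one context (for example, summer at noon) with the method of moments. This is a sketch under assumptions: it is not the regression itself, the function name is hypothetical, and the sample is drawn from a known beta distribution purely to show that the estimator recovers its parameters.

```python
import numpy as np

def beta_method_of_moments(values):
    """Estimate (alpha, beta) of a beta distribution from values in (0, 1)."""
    values = np.asarray(values, dtype=float)
    mean = values.mean()
    variance = values.var(ddof=1)
    # The moment equations only have a solution when var < mean * (1 - mean).
    if not 0.0 < variance < mean * (1.0 - mean):
        raise ValueError("sample moments are not compatible with a beta distribution")
    common = mean * (1.0 - mean) / variance - 1.0
    return mean * common, (1.0 - mean) * common

# Synthetic check: a distribution that peaks at a large value, like summer noon.
rng = np.random.default_rng(0)
sample = rng.beta(5.0, 2.0, size=20000)
alpha_hat, beta_hat = beta_method_of_moments(sample)
```

Estimating the parameters separately per hour and season would give the targets that a regression on seasonal features could then try to predict.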

Regression against a single production unit

Linear model

Using joint time x year features
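
The exact feature construction is not shown here, so the following is only one plausible reading: one indicator column per (hour of day, year) pair, fitted by ordinary least squares. The function name, the choice of hour of day as the "time" variable, and the synthetic target are all assumptions.

```python
import numpy as np

def interaction_design_matrix(hour, year):
    """One-hot design matrix with one column per (hour, year) combination."""
    hour = np.asarray(hour)
    year = np.asarray(year)
    hours = np.unique(hour)
    years = np.unique(year)
    hour_idx = np.searchsorted(hours, hour)
    year_idx = np.searchsorted(years, year)
    column = year_idx * hours.size + hour_idx
    X = np.zeros((hour.size, hours.size * years.size))
    X[np.arange(hour.size), column] = 1.0
    return X

# Synthetic data: 10 days of hourly values in each of two hypothetical years.
rng = np.random.default_rng(0)
hour = np.tile(np.arange(24), 20)
year = np.repeat([2015, 2016], 240)
y = rng.uniform(0.0, 1.0, size=hour.size)

X = interaction_design_matrix(hour, year)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ coef
residuals = y - fitted
```

With pure indicator features, each fitted value is simply the mean of its (hour, year) group, which makes clear that such a model can describe the average cycle but says nothing about how the spread changes with context.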


The fits don't look ok.

  • The residual plot shows strong heteroskedasticity, as we expect from the histogram analysis
  • The error shows time (magnitude) dependency.
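
A simple numerical companion to the residual plot is to bin the residuals by fitted value and compare their spread across bins. The sketch below uses synthetic residuals whose noise grows with the fitted value; the function name and the data are assumptions for illustration.

```python
import numpy as np

def residual_spread_by_bin(fitted, residuals, n_bins=5):
    """Standard deviation of residuals within quantile bins of the fitted values."""
    fitted = np.asarray(fitted, dtype=float)
    residuals = np.asarray(residuals, dtype=float)
    edges = np.quantile(fitted, np.linspace(0.0, 1.0, n_bins + 1))
    # Assign each point to a bin; clip so the maximum lands in the last bin.
    idx = np.clip(np.searchsorted(edges, fitted, side="right") - 1, 0, n_bins - 1)
    return np.array([residuals[idx == b].std() for b in range(n_bins)])

# Synthetic example: noise standard deviation increases with the fitted value.
rng = np.random.default_rng(0)
fitted = rng.uniform(0.0, 1.0, size=5000)
residuals = rng.normal(0.0, 0.05 + 0.5 * fitted)
spread = residual_spread_by_bin(fitted, residuals)
```

Roughly constant values in `spread` would suggest homoskedastic errors; a clear trend across bins, as in this synthetic case, is the pattern described above.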

Linear model of fold changes


Linear vs. logarithmic doesn't really matter. The differences between the models are minuscule.

Note: the heteroskedasticity is still present.

Model with autocorrelation

The above residuals show strong autocorrelation.


Removing the 48hr autocorrelation has a fairly negligible impact.
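
How the 48-hour dependence was removed is not shown, so the sketch below is one possible approach: measure the lag-48 autocorrelation of the residuals, then regress each residual on its value 48 steps earlier and keep what is left. It assumes hourly sampling (so 48 steps equal 48 hours); the function names and the synthetic series are hypothetical.

```python
import numpy as np

def autocorrelation(x, lag):
    """Sample autocorrelation of x at the given positive lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

def remove_lag_dependence(residuals, lag):
    """Regress each residual on its lagged value and return what remains."""
    r = np.asarray(residuals, dtype=float)
    past, present = r[:-lag], r[lag:]
    past_centered = past - past.mean()
    slope = np.dot(past_centered, present - present.mean()) / np.dot(past_centered, past_centered)
    intercept = present.mean() - slope * past.mean()
    return present - (intercept + slope * past)

# Synthetic residuals that depend on their own value 48 steps earlier.
rng = np.random.default_rng(0)
lag, n = 48, 5000
noise = rng.normal(size=n)
series = noise.copy()
for t in range(lag, n):
    series[t] = 0.7 * series[t - lag] + noise[t]

before = autocorrelation(series, lag)
cleaned = remove_lag_dependence(series, lag)
after = autocorrelation(cleaned, lag)
```

In this synthetic case the lag-48 autocorrelation drops to near zero, but that alone does not change the spread of the errors, which fits the observation that the correction has little impact on the fit.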

Overall, the linear model seems quite poorly suited to this data.

It may still be helpful as a starting point for models.